Self-similarity Matrix

In data analysis, the self-similarity matrix is a graphical representation of similar sequences in a data series. Similarity can be quantified by different measures, such as spatial distance (distance matrix), correlation, or comparison of local histograms or spectral properties (e.g. IXEGRAM). The technique is also used to search for a given pattern in a long data series, as in gene matching. A similarity plot can be the starting point for dot plots or recurrence plots.


Definition

To construct a self-similarity matrix, one first transforms a data series into an ordered sequence of feature vectors V = (v_1, v_2, \ldots, v_n), where each vector v_i describes the relevant features of the data series in a given local interval. The self-similarity matrix is then formed by computing the similarity of each pair of feature vectors:

    S(j,k) = s(v_j, v_k), \quad j,k \in \{1,\ldots,n\}

where s(v_j, v_k) is a function measuring the similarity of the two vectors, for instance the inner product s(v_j, v_k) = v_j \cdot v_k. Similar segments of feature vectors then show up as paths of high similarity along diagonals of the matrix. Similarity plots are used for action recognition that is invariant to point of view and for audio segmentation using spectral clustering of the self-similarity matrix.
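
The construction can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names `inner_product` and `self_similarity_matrix` and the toy vectors in `V` are assumptions introduced here, and the inner product is only one possible choice of s.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]


def inner_product(a: Vector, b: Vector) -> float:
    """Similarity s(v_j, v_k) = v_j . v_k (one possible choice of s)."""
    return sum(x * y for x, y in zip(a, b))


def self_similarity_matrix(
    vectors: Sequence[Vector],
    similarity: Callable[[Vector, Vector], float] = inner_product,
) -> List[List[float]]:
    """Return S with S[j][k] = similarity(vectors[j], vectors[k])."""
    return [[similarity(vj, vk) for vk in vectors] for vj in vectors]


# Hypothetical feature vectors: the pair (v_1, v_2) repeats as (v_3, v_4).
V = [(1.0, 0.0), (0.0, 1.0), (1.0, 0.0), (0.0, 1.0)]
S = self_similarity_matrix(V)
for row in S:
    print(row)
```

With these toy vectors, the entries S[0][2] and S[1][3] are as large as the main-diagonal entries, so the repeated pair appears as a short path parallel to the main diagonal. Any other similarity function with the same signature can be passed as `similarity`.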


Example
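
A small worked case, assuming a hypothetical scalar series, sliding-window feature vectors of width 2, and Euclidean distance as the (dis)similarity measure, may help show how a repeated segment turns into a diagonal path. The names `sliding_windows` and `distance_matrix` are illustrative only.

```python
import math
from typing import List, Sequence


def sliding_windows(series: Sequence[float], width: int) -> List[List[float]]:
    """Feature vector v_i = the `width` consecutive samples starting at i."""
    return [list(series[i:i + width]) for i in range(len(series) - width + 1)]


def distance_matrix(vectors: Sequence[Sequence[float]]) -> List[List[float]]:
    """D[j][k] = Euclidean distance between v_j and v_k (0 means identical)."""
    return [[math.dist(vj, vk) for vk in vectors] for vj in vectors]


# Hypothetical series: the motif 1, 3, 2, 4 occurs at positions 0 and 6.
series = [1, 3, 2, 4, 7, 9, 1, 3, 2, 4, 8]
D = distance_matrix(sliding_windows(series, width=2))

# Zero distances above the main diagonal mark repeated windows.
repeats = [
    (j, k)
    for j in range(len(D))
    for k in range(j + 1, len(D))
    if D[j][k] == 0.0
]
print(repeats)  # → [(0, 6), (1, 7), (2, 8)]
```

The three matches all satisfy k - j = 6, so they lie on a single line parallel to the main diagonal, offset by the lag between the two occurrences of the motif. Because a distance is used here, low values indicate high similarity.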


See also

* Recurrence plot
* Distance matrix
* Similarity matrix
* Substitution matrix
* Dot plot (bioinformatics)


Further reading

* M. A. Casey (2002). "Sound Classification and Similarity Tools". In B. S. Manjunath, P. Salembier, T. Sikora (eds.), Introduction to MPEG-7: Multimedia Content Description Language. J. Wiley. pp. 309–323. ISBN 978-0471486787.


External links

* http://www.recurrence-plot.tk/related_methods.php